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Abstract 

This paper introduces the recursive sweeping preconditioner for the numerical solution of 
the Helmholtz equation in 3D. This is based on the earlier work of the sweeping preconditioner 
with the moving perfectly matched layers (PMLs). The key idea is to apply the sweeping 
preconditioner recursively to the quasi-2D auxiliary problems introduced in the 3D sweeping 
preconditioner. Compared to the non-recursive 3D sweeping preconditioner, the setup cost 
of this new approach drops from to 0{N), the application cost per iteration drops 

from 0{N log N) to 0{N), and the iteration count only increases mildly when combined with 
the standard GMRES solver. Several numerical examples are tested and the results are com¬ 
pared with the non-recursive sweeping preconditioner to demonstrate the efficiency of the new 
approach. 
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waves. 
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1 Introduction 

Let the domain of interest be the unit cube D = (0,1)^ for simplicity. The time-independent wave 
field u{x) satisfies the Helmholtz equation 

Au(x) -b -^--u{x) = f{x), Vx € D, (1) 

c^[x) 

where w is the angular frequency, c(x) is the velocity field with a bound Cmin < c(x) < c^ax 
where Cmin and Cmax are assumed to be of 0(1), and f(x) is the time-independent external force. 
The typical boundary conditions for this problem are approximations of the Sommerfeld radiation 
condition, which means that the wave is absorbed by the boundary and there is no reflection coming 
from it. Other boundary conditions, such as the Dirichlet boundary condition, can also be specified 
on part of the boundary depending on the modeling setup. 

In this setting, a;/(27r) is the typical wave number of the problem and A = 27r/u; is the typical 
wavelength. For most applications, the Helmholtz equation is discretized with at least a few number 
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of points (typically 4 to 20) per wavelength. So the number of points n in each direction is at least 
proportional to w. As a result, the total degree of freedom N = = V,{uj^) can be very large 

for high frequency 3D problems. In addition, the corresponding discrete system is highly indefinite 
and the standard iterative solvers and/or preconditioners are no longer efficient for such problems. 
These together make the problem challenging for numerical solution. We refer to the review article 
[3] by Ernst and Gander for more details on this. 

Recently in [3, Engquist and Ying developed a sweeping preconditioner using the moving per¬ 
fectly matched layers (PMLs) and obtained essentially linear solve times for 3D high frequency 
Helmholtz equations. A key step of that approach is to approximate the 3D problem with a se¬ 
quence of 0(n) PML-padded auxiliary quasi-2D problems, each of which can be solved efficiently 
with sparse direct method such as the nested dissection algorithm. As an extension, this paper 
applies the sweeping idea recursively to further reduce each auxiliary quasi-2D problem into a se¬ 
quence of PML-padded quasi-lD problems, each of which can be solved easily with the sparse LDU 
factorization for banded systems. As a result, the setup cost of the preconditioner improves from 
(9(]V^/3) ^;q 0{N) and the application cost reduces from 0{N log N) to 0{N). 

There has been a vast literature on iterative methods and preconditioners for the Helmholtz 
equation and we refer to the review articles [S] by Erlangga and [S] by Ernst and Gander for a rather 
complete discussion. The discussion here only touches on the methods that share similarity with the 
sweeping preconditioners. The analytic ILU factorization (AILU) [10] is the first to use incomplete 
LDU factorizations for preconditioning the Helmholtz equation. Compared to the moving PML 
sweeping preconditioner, the method uses the absorbing boundary condition (ABC), which is less 
effective compared to the PML, and hence the iteration count grows much more rapidly. 

Since the sweeping preconditioners jSID were proposed, there have been a number of exciting 
developments for the numerical solutions of the high frequency Helmholtz equation, including but 
not limited to [HI [IH [T8| [T6| [13 [HI [2| [S] |20| . In [T5| , Stolk proposed a domain decomposition 
algorithm that utilizes suitable transmission conditions based on the PMLs between the subdomains 
to achieve a near-linear cost. In Poulson et al discussed a parallel version of the moving 
PML sweeping preconditioner to deal with large scale problems from applications such as seismic 
inversion. In [iii[is[n], Tsuji and co-authors extended the moving PML sweeping preconditioner 
method to other time-harmonic wave equations and more general numerical discretization schemes. 
In [Hj , Vion and Geuzaine proposed a double sweep algorithm, studied several implementations of 
the absorbing boundary conditions, and compared their numerical performance. Finally in [IE], 
Chen and Xiang introduced a sweeping-style domain decomposition method where the emphasis was 
on the source transferring between the adjacent subdomains. In |2()] . Zepeda-Niihez and Demanet 
developed a novel parallel domain decomposition method that uses transmission conditions to define 
explicitly the up- and down-going waves. 

The rest of the paper is organized as follows. We first state the problem and the discretiza¬ 
tion used in Section [^ Section [^ reviews the non-recursive moving PML sweeping preconditioner 
proposed in [ 3 . Section discusses in detail the recursive sweeping preconditioner. Numerical 
results are presented in Section [^ Finally, the conclusion and some future directions are provided 
in Section [6] 

2 Problem Formulation 

Following [3 , we assume that the perfectly matched layer (PML) [3 111 HI] is utilized at part of the 
boundary where the Sommerfeld radiation condition is specified. The sweeping preconditioner in 
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[7] requires that at least one of the six faces of the domain D = (0,1)^ is specified with the PML 
boundary condition. As we shall see soon, the recursive sweeping preconditioner instead requires 
the PML condition to be specified at least at two non-parallel faces. Without loss of generality, we 
assume that it is specified at a ;2 = 0 and x^, = 0. There is no restriction on the type of boundary 
conditions specihed on the other four faces. However, to simplify the discussion, we assume that 
the Dirichlet condition is used. The PML boundary condition introduces auxiliary functions 

— , X e [ 0 , 77 ], 

V \ V J 

0, X G (77,1], 


Sl(x) = l, S 2 {x)=s{x 2 ), S 3 {x)=s{x 3 ), 


c) = 


and 


3(37) = ( 1 -I- 


a{x) 


-1 


where C is an appropriate positive constant independent of w, and 77 is the PML width, which is 
typically around one wavelength. The Helmholtz equation with PML is 

(siai)(si9i) -b {,S 2 d 2 ){s 2 d 2 ) + {S 3 d 3 ){szd 3 ) + u{x) = f (x), WxG D = (0,1)^, 

[ 77 ( 37 ) =0, Vo; S dD. 

It is typically assumed that that the support of f{x) is in (0,1) x ( 77 ,1) x ( 77 ,1), which means that 
the force is not located in the PML region. The cube [0,1]^ is discretized with a Cartesian grid 
where the grid size is h = and n is proportional to w. The set of all the interior points of the 
grid is given by 


P = {Pi,j,k = {ih,jh,kh) ■■ 1 <i,j,k < n}, 


and the degree of freedom is = n^. 

Applying the standard 7-point finite difference stencil results in the discretized system 


T 1 

\ -L / t-r±/ / 

h ^ 

. ( 52)1 j',fc 

f {S2)i,j + l/2,k 

+ h 

V h 


f (s3)iy,fc+i/2 


(■Sl)i-l/ 2 j,fe 


U'i — 


/ ^ (■52)i,j-l/2,fe / s 

t.. .. \ ('®3)i,i,/c-l/2 _ ,, 


CJ 




^i,j,k fi,j,k: '^1 ^ Ji k ^ Tl^ 


( 3 ) 


where the subscript (z, j, k) means that the corresponding function is evaluated at the point = 
(z/i, jh, kh) and the definition of the points here extends to half integers as well. The computational 
task is to solve ([^ efficiently. We note that, unlike the symmetric version adopted in Em, here the 
nonsymmetric version of the equation is used. Figure [^provides an illustration of the computational 
domain and the discretization grid. 
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Figure 1: The domain of interest. Left is a 3D view of the domain. Right is an X 2 -X 3 cross section 
view, where each cell stands for a ID column. The gray area stands for the PML region. 


3 Review of the Sweeping Preconditioner with Moving PML 


This section gives a brief review of the non-recursive moving PML sweeping preconditioner proposed 
in [7] for completeness. More details can be found in the original paper [7]. The starting point of 
the sweeping preconditioner is a block LDU factorization called the sweeping factorization. To build 
this factorization, the algorithm sweeps along the X 3 direction starting from the face X 3 = 0. The 
unknowns with subscript index (i,j,k) are ordered with column-major order, i.e., first dimension 
1, then dimension 2, and finally dimension 3. We define the vectors 


f ~ • ■ ■ ; • ■ ■ ; ; fn,n,n] • 


By introducing 

as the points on the m-th plane and also 

fm — [/*!,!,m? • ■ • ; ; fn,n,m\ 5 

one can write the system ([^ compactly as Au = f with the following block form 

Ai^i Ai,2 

A2,1 A2^2 

An—1,‘ 

A , A 





7:,:.r 


W:,:,2 

= 



1 

1_ 




(4) 
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By defining Sk and Tk recursively via 


Si=Ai^i, Ti = S^\ 

Sm — ^m,m l^m—l^m—l,m5 ^ — ^5 ■ 


, ,n, 


the standard block LDU factorization of the block tridiagonal matrix A is 


■5i 


A — Li... Ln-i 




where Lm and Um are the corresponding unit lower and upper triangular matrices with the only 
non-zero off-diagonal blocks 


7 -fm) — : fbn (-^m : 7 ^ — 1 , . . . , TT 1 . 


It is not difficult to see that computing this factorization takes steps. Once it is available, 

u can be computed in 0 {N^/^) steps by 


u = 


= A-^f = [/fi. 

■■U-\ 

1 







-1 


The main disadvantage of the above algorithm is, Sm and Tm are in general dense matrices of 
size x so the corresponding dense linear algebra operations are expensive. The sweeping 
preconditioner overcomes this difficulty by approximating efficiently for with mh € (ry, 1], 
i.e., for Pm not in the PML region at the face X 3 = 0. The key point is to consider the physical 
meaning of Tm- From now on let us assume rj = bh which implies that there are b layers in the 
PML region at X 3 = 0. Restricting the factorization to the upper-left m x m block of A where 
m = b + 1 ,... ,n gives 


- 1 

to 

_ 1 


1 — 

^2,1 ^2,2 

— Li ... Ljjyi—\ 

^2 




A , A 


Sra_ 


where Lt and Ut are redefined by restricting to their upper left mx m blocks. Inverting both sides 
leads to 


^1,1 

_1 

-1 


Ti 

1 

^2,1 

^2,2 

A.jn—l,7n 


■■Um\ 

T2 

T 

^ m_ 


A . A 

1 -^m^m 



- 


The left-hand side is the discrete half-space Green’s function with Dirichlet zero boundary condition 
at X 3 = {m+ l)h and a straightforward calculation shows that the lower-right block of the right- 
hand side is Tm- Therefore, Tm is the discrete half-space Green’s function restricted to the m-th 
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layer. Note that, the PML at 3:3 = 0 is used to simulate an absorbing boundary condition. If 
we assume that there is little reflection during the transmission of the wave, we can approximate 
Tm by placing the PML right next to the m-th layer since the domain of interest is only the m-th 
layer (see Figure]^. This is the key idea of the moving PML sweeping preconditioner, where the 
operator Tm is numerically approximated by putting the PML right next to the domain of interest 
and solving a much smaller system to save the computational cost. 




Figure 2: Left: Tm is the restriction to Pm (the dashed grid) of the half space Green’s function 
on the solid grid. Right: By moving the PML right next to the layer Pm, the operator Tm is 
approximated by solving the equation on a much smaller grid. 

More precisely, we introduce an auxiliary problem on the domain Dm = [0,1] x [0,1] x [(m — 
b)h, (m + l)/i] : 

(sii9i)(sii9i) + {S 2 d 2 ){s 2 d 2 ) + (s™93)(sr^3) + ^( 2 ^) = 9{x), Va; e Dm, 

[u(a:) = 0, Vx S dDm, 

where sjp(x) = s(x 3 — (m — b)h). The domain Dm is discretized with the partial grid 

P(m — b-\-l)'.m •— ^ & T 1 ^ t ^ TTi}. 

Applying the same central finite difference scheme gives rise to the corresponding discretized system, 
denoted as 


HmV = g, m = b+l,...,n. 

To approximate Tm, we numerically define operator : a € C"^ —> /3 G C"^ by the following 

procedure: 


6 




















































































1. Introduce a vector g defined on P^m-b+iy.m by setting a to the layer and zero everywhere 
else. 


2. Solve the discretized auxiliary problem HmV = g on P(rn-b+i)-.m with g from step 1. 

3. Set /3 as the restriction on Pm of the solution v from step 2. 

The discretized system is a quasi-2D system as b is typically a small constant, so the system can 
be solved efficiently by the nested dissection method dnisiiis]. 

The first b layers, which are in the PML region of the original problem ([^, need to be handled 
with a slight difference. Define 


f-.,-.,l-.b = 




Then the system Au = / can be written as 


^1:5,1:6 ^1:6,6+1 




/:.:,1:6 

^6+1,1:6 ^6+1,6+1 


6 -1-1 


6 -1-1 

An—l^n 





A A 

■^n,n—1 -^n,n 


_ _ 


. _ 


For the first b layers, we simply define \Ti±\ as the inverse operator of Plb := However, it 

is essential that [TiibJ is stored in a factorized form by applying the nested dissection method to 
Pf), since PlhV = g is also a quasi-2D problem. 

Based on the above discussion, the setup algorithm of the moving PML sweeping preconditioner 
is given in Algorithm]^ 


Algorithm 1 Construction of the moving PML sweeping preconditioner of the system (§. Com¬ 
plexity = 0{b^rA) = 0{b^N^/^). 

Construct the nested dissection factorization of Hb, which defines [TiibJ. 
for m = & -I- 1, ..., n do 

Construct the nested dissection factorization of Hm, which defines [T^J. 

end for 


Once the factorization is completed, and can be applied using the nested dissection 

factorization. The application process of the sweeping preconditioner is given in Algorithm 


4 Recursive Sweeping Preconditioner 

Recall that the PML is also applied to the face X 2 = 0. Therefore, each quasi-2D auxiliary problem 
is itself a discretization of the Helmholtz equation with the PML specified on one side. Following 
the treatment in [7] for the 2D Helmholtz equation, it is natural to apply the same sweeping idea 
once again along the X 2 direction, instead of the nested dissection algorithm used in the previous 
section. 
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Algorithm 2 Computation of m « A using the factorization from Algorithm Complexity 
= OQPn^logn) = 0{b^ N log N). 

= [Ti-,b\ 

W:,:,b+1 = [Tb+l\{f-.,-.,b+l — ^b+l,l-.bU-.^-.^l:b) 

for m = &+l,...,n—1 do 

m+1 -^m+1 ,m'^:,m) 

end for 

for m = n — 1,... ,b + 1 do 

L^mJ 

end for 

— L^l:hJ (^1:6,&+1^:,:,b+l ) 


4.1 Inner sweeping 


Recall that the quasi-2D subproblems of the non-recursive sweeping preconditioners are H^v = 
g,m = b^... ,n. Since they have essentially the same structure, it is sufficient to consider a single 
system Av = g where A can be anyone of Here the accent mark is to emphasize that the 

problem under consideration is quasi-2D. To formalize the sweeping preconditioner along the X 2 
direction, we define, up to a translation, 

P = {pij^k = {ih, jh,kh) : 1 < z, J < n, 1 < fc < &}, 
to be the discretization grid. For each m = 1,..., n, let 

• • ■ ^Pl,7n,bi ■ • • t Pn,m,b}: 

- j : '^n,m,b] : 

i 9l,m,b: ■ ■ ■ : 17n,m,b] 


For the first b layers in the X 2 direction, we also define 


Pi-.b = {Pi, ■ ■ ■,Pb}, 

rp ji yjn 

r T T ^T 

9-,1-b,-. = [g-.,l,9:,bj ■ 


In this section, we reorder the vectors v,g hy grouping the 3rd dimension first and applying the 
column-major ordering to dimensions 1 and 2: 




r r T il 

9= [9 :,1 9:,nj 


With this ordering, the corresponding system Av = 9 is written as 


^ 1 : 5 , 1:6 ^ 1 : 6 , 6+1 


V,,l:b,: 


5 :. 1 : 6 ,: 

^ 6 + 1 , 1:6 ^ 6 + 1 , 6+1 


V-.,b+l.,: 

_ 

5 :, 6 - 1 - 1 ,: 

An—l^n 





A 1 A 

L -^n,n —1 -^n,n J 




5 :,n,: 
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For the block LDU factorization of A, we define 


Sl:h = T’l-J, = 

Sb +1 = ^6+1,6+1 — blb+l,1:621:6^1:6,6+1, 21+1 = 'S'j+'^l, 

‘^m — ^m,m —l21n—l^m—l,m, ^ ^ “t“ 2, . . . , 72, 

then A can be factorized as 

[Si,b 


A — Li:bLb+l ■ . ■ Ln-l 


■S'b+i 


Sn 


Un-1 ■ ■ ■ Ub+lUi-,b, 


where the non-zero off-diagonal blocks of the unit lower and upper triangular matrices Lm and Um 
are given by 


Ll-.b{Pb+l, Pl-.b) = Ab+l^l:bTl.,b, Ul-,b{Pl:b,Pb+l) = Tl-.bAl:b,b+l, 

Lm (-fm+1 21n+l) — PmAm^m+1^ "bJl — 6 -j- 1, . . . , 71 1. 

Then the solution v can be computed by 


Pn-l ■ ■ ■ Pb+l^l:b9- 


V:,l:6,: 


'Tl,b 


V:,6+l,: 

= A-^9 = UZlUb^---Un\ 

21+1 





21 . 


By comparing the factorization of the upper-left (m — & -I- 1) x (m — & -I- 1) block of A, where 
m = b + 1 ,... ,n, we have 


^1:6,1:6 

^1:6,6+1 




'Si-.b 

- 

^6+1,1:6 

^6+1,6+1 

A 1 A 

1 -^m,m J 

— Li-.bLb+i ■ 

• —1 

Sb+i 

Sm- 


Um—1 ■ ■ ■ Ub+lPl:b: 


where Lt and Ut are redefined as their restrictions to their top-left (m — 6-1-1) x (771-6-1-1) blocks. 
Inverting both sides gives 


^1:6,1:6 

^1:6,6+1 


-1 


'Tl:b 

1 

^6+1,1:6 

^6+1,6+1 

A 1 A 

1 -^m,m J 

= ^+6 f^6+l • 

■U-\ 

21+1 

T 

mJ 


Thus, by repeating the argument in Section the matrix Tm is the restriction to the layer 
of the discrete half-space Green’s function. It can be approximated by [2i^J, which is defined by 
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solving a quasi-lD problem obtained by placing a moving PML right next to = mh (see Figure 
[^. Each auxiliary quasi-lD problem in this inner sweeping step can be solved by the sparse block 
LDU factorization efficiently, with ordering the system by grouping dimension 3 and 2 first and 
dimension 1 last. 



Figure 3: Left: is the restriction to Pm (the dashed grid) of the Green’s function on the quasi- 

2D solid grid. Right: By moving the PML right next to Pm, the operator Tm is approximated by 
solving the problem on a quasi-lD grid. 


More specifically, for each to, we introduce the auxiliary problem on the domain Dm = [0,1] x 
[(to — b)h, (to + l)/i] X [0, (6 + l)h]: 

{sidi){sidi) + (s™ 52 )(s™ 92 ) + is 3 d 3 ){s 3 d 3 ) + w{x) = q{x), Wx S Dm, 

I ri;(a:) =0, Vx S dDm, 

where s^(x) = s(x 2 — (to — h)h). The domain Dm is discretized with the grid 

P[m — h+l):m ' Tfl 6+1 + t + TO.}, 

and the same central difference numerical scheme is used here. We denote the corresponding 
discretized^system as HmW = q. Similar to the process described in Section we define the 
operator : ol S C”^ —t /3 G by the following procedure: 

1 . Introduce a vector q defined on the grid P(m-b+i):m by setting a to the layer Pm and zero 
everywhere else. 

2 . Solve the auxiliary quasi-lD problem HmW = q on P(m-b+i)-.m with q from step 1. 

3. Set P as the restriction on Pm of the solution w from step 2. 
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For the first b layers, is simply defined as the inverse operator of Hf, := which 

i^ essentially the same as Ti:h, but implemented by using the sparse block LDU factorization of 
Hb- Summarizing all this, the setup and application algorithm of the inner moving PML sweeping 
preconditioner are given in Algorithms and respectively. 


Algorithm 3 Construction of the inner moving PML sweeping preconditioner of the quasi-2D 
problem Av = g. Complexity = 0{b^ii?). 

Construct the sparse block LDU factorization of Hb, which defines [Tii&J. 
for m = & + 1, ..., n do 

Construct the sparse block LDU factorization of Lfm, which defines [TrnJ- 

end for 


Algorithm 4 Computation of w « A using the factorization from Algorithm Complexity 
= 0{b^n‘^). 

= [T^b\g-.,i-.b,-. ^ 

i’-.,b+i,-. = [Tb+i\ig-.,b+i,-. — Ab+i^i-bV-^i-b,-) 

for m = & + , n — 1 do ^ 

end for 

for m = n — 1 ,..., l^o 

L^mJ (^ 77 ^^ 772 + 1 *^:.m+l,: ) 

end for 

V-.A-.b,-. = '^ 1 , 1 : 6 ,: — \ Tl-.b\(Ai.,b^b+lV,,b+l,-) 


4.2 Putting together 

As we pointed out earlier, the matrix A can be anyone of Hm,m = b^... ,n, where Algorithms 
and can be applied. Notice that solving the subproblems exactly with the nested dissection 
algorithm results in the approximation [Tml to T^. This extra-level of approximation defines a 
further approximation, which shall be denoted by [[TjtjJJ : a G C" —>■ /3 S C" (to be precise, 
for the first b layers, it is [[T'i:6jj : a G -G P G C"^^). The steps for carrying out [[TttiJJ are 

similar to the ones for except that one uses Algorithms|^and[^to solve the quasi-2D problems 
approximately (instead of the nested dissection method that solves them exactly). 

Given all these preparations, the setup algorithm of the recursive sweeping preconditioner can 
be summarized compactly in Algorithm and the application algorithm is given in Algorithm 

In the outer loop of Algorithm]^ the unknowns are eliminated layer by layer in the 0:3 direction. 
In the application of [[TttjJJ , there is the inner loop in which the unknowns in each quasi-2D problem 
are eliminated in the X 2 direction. The whole algorithm serves as a preconditioner for the original 
linear system (§. Notice that, in the recursive sweeping preconditioner, the quasi-2D problems 
are solved only approximately. Therefore, the overall accuracy might not be as good as the non¬ 
recursive method. But as we will show in the next section, the performance of the preconditioner 
is only mildly affected. 
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Algorithm 5 Construction of the recursive moving PML sweeping preconditioner of the linear 
system (j- Complexity = 0{b^n^) = 0{b^N). 

Construct the inner moving PML sweeping preconditioner of H}, by Algorithm This gives 
for m = & + 1,..., n do 

Construct the inner moving PML sweeping preconditioner of Hm by Algorithm This gives 

iTml 

end for 


Algorithm 6 Computation of m « A using the factorization from Algorithm Complexity 
= 0(64^3) = 0[b^N). 

f ,^,^ 1 , 1 , 

for m = &+l,...,n—1 do 
end for 

for m = n— 1 ,..., 6+1 do 
end for 

— [[TlibJJ 


The above algorithms are described in a way to present the main ideas clearly. In the actual 
implementations, a couple of modifications are taken in order to maximize the efficiency: 

1. For each auxiliary problem, both in the inner loop and the outer loop, several layers are 
processed together instead of one layer. 

2. For the PML introduced in the auxiliary problems, the number of layers in the auxiliary PML 
region does not have to match the number of layers b used for the boundary PML at 2:2 = 0 
and X 3 = 0. In fact, the thickness of the auxiliary PML is typically thinner for the sake of 
efficiency. 

3. The problem we described above has zero Dirichlet boundary conditions on the other four faces 
of the cube. If instead, the PMLs are put on all the faces, then the sweeping preconditioner 
sweeps with two fronts from two opposite faces respectively and they meet in the middle with 
a subproblem with PML on both sides instead of only one side, as described in [7] . 

5 Numerical Results 

In this section we test several numerical examples to illustrate the performance of the recursive 
sweeping preconditioner. All algorithms are implemented in MATLAB and the tests are performed 
on a 2.0 GHz computer with 256 GB memory. We use the GMRES algorithm as the iterative solver 
with relative residual 10“3 and restart value 40 for the entire 3D system. The quasi-2D problems 
are solved approximately by applying the inner sweeping preconditioner only once for the sake of 
efficiency. The velocity fields and forces tested are kept the same with [7] so that the results can be 
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compared easily. The PMLs are put on all six sides of the cube [0,1]^ to simulate the Sommerfeld 
radiation condition. 

We test three velocity fields (see Figure]^: 

(a) A converging lens with a Gaussian profile at the center of the domain. 

(b) A vertical waveguide with a Gaussian cross-section. 

(c) A random velocity field. 



(a) (6) (c) 

Figure 4: The three velocity fields tested. 


For each velocity field, the tests are performed for two external forces: 

(a) A Gaussian point source centered at (1/2,1/2,1/4). 

(b) A Gaussian wave packet with wavelength comparable to the typical wavelength of the domain. 
The packet centers at (1/2,1/4,1/4) and points to the direction (0, l/-\/2, l/-\/2). 

We vary the typical wave number a;/(27r), test the behavior of the recursive preconditioner, and 
compare the results with the non-recursive preconditioner. 

In these tests, each wavelength is discretized with q = 8 points. The width of the PML at the 
boundary of the cube is 9h, and the width of the auxiliary PML for the middle layers is 5h. The 
number of layers processed in each auxiliary problem is 4. The algorithm sweeps with two fronts 
from 0:3 = 0 and 0:3 = 1 in the outer loop, and with two fronts from X2 = 0 and 0:2 = 1 in the inner 
loop. 

The results are reported in the following tables. Tgetup is the time used to construct the pre¬ 
conditioner in seconds. Tsoive is the time used to solve the system in the preconditioned GMRES 
solver in seconds and Niter is the corresponding iteration number. “NR” stands for the original 
non-recursive method while “R” stands for the recursive method introduced in this paper. The 
“ratio” is the time cost of the recursive method over the non-recursive method. The numerical im¬ 
plementation of the non-recursive method is slightly improved as compared to [7] , by incorporating 
a more accurate PML discretization. Therefore, the results here for the non-recursive method are 
better compared to the ones in [7]. 

From these tests we can make the following observations: 

1. The setup time of the recursive preconditioner is significantly dropped compared to the non¬ 
recursive one. The advantage becomes more and more obvious when the problem size gets 
larger. This is because the setup cost of the recursive method scales like 0{N) and the 
non-recursive one scales like 0 {N‘^/^). 
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^setup 


Aiter 

^solve 

uj/{2tt) q N 

NR R ratio 

fix) 

NR R 

NR R ratio 

8 8 63^ 

46.923 19.823 42% 

(a) 

ib) 

3 4 

4 4 

12.313 16.355 133% 
14.973 16.862 113% 

16 8 127^ 

537.12 180.99 34% 

(a) 

ib) 

3 4 

4 4 

116.44 169.34 145% 
150.67 168.65 112% 

32 8 255^ 

5927.0 1308.0 22% 

(a) 

ib) 

4 5 

4 5 

1273.0 2039.8 160% 
1312.1 2070.4 158% 


Table 1: Results for velocity field (a) in Figure]^ Solutions with w/(27r) = 16 at xi = 0.5 are 
presented. 


2. The iteration number of the recursive preconditioner increases only slightly compared to 
the non-recursive one. Typically it needs about 1 more iteration. This is mainly because 
the recursive method solves the quasi-2D auxiliary problems approximately while the non¬ 
recursive one solves them accurately. 

3. The application time of the recursive sweeping preconditioner is not as fast as the non-recursive 
method, due to a larger prefactor of the time complexity. However, we think this sacrifice is 
acceptable since a huge amount of time is saved in the setup process. One thing that needs 
to be pointed out is that the ratio of the solve time increases as the problem size increases, 
which seems to be unexpected since the solve time of the recursive method scales like 0{N) 
and the non-recursive one scales like 0{N log N). The reason of this behavior is, when the 
problem size increases, the size of the dense linear algebra operations increases as well in 
the non-recursive method since the front size in the nested dissection method gets larger, 
while in the recursive method, the size of the dense linear algebra operations in the quasi-ID 
block LDU setup process and solve process is kept the same. Since MATLAB processes large 
scale dense linear algebra operations in a parallel way, the non-recursive method gains some 
advantages from this. 

The memory cost of the recursive method is also advantageous. In our implementation, the 
recursive method costs only 30% memory compared to the non-recursive method in the N = 255^ 
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^setup 


Niter 

^solve 

uj/{2tt) q N 

NR R ratio 

fix) 

NR R 

NR R ratio 

8 8 63^ 

48.855 18.490 38% 

(a) 

ib) 

3 4 

3 5 

11.249 15.996 142% 
11.048 19.581 177% 

16 8 127^ 

524.61 163.16 31% 

(a) 

ib) 

4 5 

3 5 

152.32 212.59 140% 
111.71 213.24 191% 

32 8 255^ 

6038.8 1319.0 22% 

(a) 

ib) 

5 6 

4 5 

1676.7 2471.9 147% 
1345.3 2084.6 155% 


Table 2: Results for velocity field (b) in Figure]^ Solutions with uj/{2tt) = 16 at xi = 0.5 are 
presented. 


case. Theoretically, the memory cost of the recursive method scales like 0{N) while the non¬ 
recursive one scales like 0{Nlog{N)). This is another main advantage of the recursive method. 

6 Conclusion and Future Work 

In this paper, we introduced a new recursive sweeping preconditioner for the 3D Helmholtz equation 
based on the moving PML sweeping preconditioner proposed in [7]. The idea of the sweeping 
preconditioner is used recursively for the auxiliary quasi-2D problems. Both the setup cost and 
application cost of the preconditioner are reduced to strict linear complexity. The iteration number 
remains essentially independent of the problem size when combined with the standard GMRES 
solver. Numerical results show that the setup time drops significantly compared to the non-recursive 
method, while the solve cost increases only slightly. 

Several questions still remain open and some potential improvements can be made. First, we 
use the PML to simulate the Sommerfeld condition. Many other simulations of the absorbing 
boundary condition can be implemented and the recursive sweeping idea can be used as long as the 
stencil of the simulation is local. Second, the numerical scheme used in this paper is the standard 
central difference scheme, whose dispersion relationship is a poor approximation of the true one. 
More accurate numerical schemes can be implemented and the iteration number may be dropped 
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^setup 


Niter 

^solve 

w/(27r) q N 

NR R ratio 

f{x) 

NR R 

NR R ratio 

8 8 63^ 

48.949 18.213 37% 

(a) 

(&) 

4 4 

4 4 

16.487 16.252 99% 

15.747 16.513 105% 

16 8 127^ 

553.70 147.00 27% 

(а) 

(б) 

4 4 

5 5 

158.41 170.59 108% 

196.91 215.84 110% 

32 8 255^ 

6024.1 1299.2 22% 

(а) 

(б) 

5 5 

5 6 

1599.6 1929.1 121% 
1635.2 2314.5 142% 


Table 3: Results for velocity field (c) in Figure]^ Solutions with uj/{2'it) = 16 at Xi = 0.5 are 
presented. 


potentially benefiting from the increment of the accuracy of the numerical scheme. 

Parallel processing can also be introduced to the current recursive method. First, When sweeping 
from both sides of the domain, either in the outer loop of the algorithm or in the inner loop, the 
processing of the two fronts can be paralleled so in total it could be 4 times faster with parallelization 
theoretically. Second, the quasi-lD problems are solved by the block LDU factorization in the 
current setting. If instead, we use the ID nested dissection algorithm for the quasi-lD problems, 
then it can be easily paralleled and the total cost will remain essentially the same. Last, one can 
notice that, the setup process of the algorithm is essentially O(n^) quasi-lD subproblems which are 
independent with each other so this process can be done in parallel, and compared to the original 
method, which contains only 0{n) quasi-2D independent subproblems, the potential advantages of 
parallelization in the setup stage is more obvious here. 

There are also several other advantages of the recursive sweeping method that concern flexibility. 
First, as mentioned above, the setup process contains O(n^) quasi-lD independent subproblems. 
So if the velocity field is modified on a subdomain which involves only limited subproblems, then 
the factorization can be updated with only a slight modification on these involved subproblems. 
Compared to the original method, where the subproblems are 0{n) quasi-2D plates, the recursive 
method is more flexible on updating the factorization. This could be advantageous in seismic 
imaging where the velocity field is tested and modified frequently. Second, when the factorization 
for the O(n^) subproblems is done, there are naturally two ways of using the factorization. One is. 
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as mentioned in this paper, sweeping along the X 3 direction in the outer loop, and sweeping along 
the xi direction in the inner loop. Another choice is to do the opposite, which is sweeping along 
the X 2 direction in the outer loop and the X 3 direction in the inner loop. Each of these two choices 
shows some “bias” since the residual of the system is accumulated in some “chosen” order. So one 
may ask that, is it possible to combine the two choices together to make the solve process more 
flexible such that the total solve time can be even less? This is another interesting question to be 
examined. 
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